tffiSf) TM-?33 7i> 


NASA -Technical Memorandum 83376 


NASA-TM-83376 19830016196 


Efficient Solution of the Euler and Navier- 
Stokes Equations With a Vectorized 
Multiple-Grid Algorithm 


*• . V; \ : ■ 

Rodrick V. Chima and Gary M. Johnson 
Lewis Research Center 
Cleveland, Ohio 



!: 


p 


/;vir 

iwOO 


LANGLEY RESEARCH CENTER 
LIBRARY, NASA 
HAMPTON, VIRGINIA 


Prepared for the 

Sixth Computational Fluid Dynamics Conference 

sponsored by the American Institute of Aeronautics and Astronautics 

Danvers, Massachusetts, July 13-15, 1983 






E-1644 


EFFICIENT SOLUTION OF THE EULER AND NAVIER-STOKES EQUATIONS 
WITH A VECTORIZED MULTIPLE-GRID ALGORITHM 

Rodrick V. Chima and Gary M. Johnson 

National Aeronautics and Space Administration 
Lewis Research Center 
Cleveland, Ohio 44135 


ABSTRACT 

A multiple-grid algorithm for use in efficiently 
obtaining steady solutions to the Euler and Navier- 
Stokes equations is presented. The convergence of 
the explicit MacCormack algorithm on a fine grid is 
accelerated by propagating transients from the 
domain using a sequence of successively coarser 
grids. Both the fine- and coarse-grid schemes are 
readily vectorizable. The combination of multiple- 
gridding and vectorization results in substantially 
reduced computational times for the numerical solu- 
tion of a wide range of flow problems. Results are 
presented for subsonic, transonic, and supersonic 
inviscid flows and for subsonic attached and sepa- 
rated laminar viscous flows. Work reduction factors 
over a scalar, single-grid algorithm range as high 
as 76.8. 


solution. As the multiple-grid scheme retains the 
explicit nature of the underlying fine-grid proce- 
dure, it may be vectorized in a straightforward 
manner. This leads to a further reduction in com- 
putational work. In combination, multiple-gridding 
and vectorization produce a quite efficient explicit 
algorithm for the solution of both inviscid and 
viscous flow problems. 

EQUATIONS OF MOTION 

The nondimensional equations of motion may be 
written in conservation law form as 

q t = -(F x + y (1) 

where, for the full Navier-Stokes equations. 


INTRODUCTION 
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Steady solutions to both the Euler and Navier- 
Stokes equations are commonly computed as temporal 
asymptotes to the unsteady equations of motion with 
steady boundary conditions applied. This is done 
because the unsteady equations are either purely 
hyperbolic, in the case of the Euler equations, or 
hyperbolic - parabolic, for the Navier-Stokes equa- 
tions, and are thus amenable to numerical solution 
by robust time-marching procedures. Furthermore, 
such procedures are relatively easy to implement 
and, to the extent that the computation is time- 
accurate, allow physical interpretation of the con- 
vergence history. 

Relaxation procedures for solving the steady, 
compressible versions of the Euler or Navier-Stokes 
equations are a topic of current research and have 
not as yet resulted in production algorithms. 

Both explicit and implicit time-marching proce- 
dures are presently in widespread use. The explicit 
methods are simple, easily vectorizable and allow a 
good deal of flexibility in the treatment of bound- 
ary conditions. Their largest shortcoming lies in 
their conditional stability, which may place rather 
severe limitations on the time step size permissible 
on a given grid. When only a steady solution is 
sought and the accurate resolution of transients is 
of no consequence, explicit methods may consequently 
exhibit poor convergence rates. 

Implicit methods are one possible remedy to the 
slow convergence of explicit schemes. These meth- 
ods, at the expense of a higher operations count, 
are generally unconditionally linearly stable and 
hopefully permit time steps to be taken as large as 
is consistent with accuracy requirements. In prac- 
tice, large time steps may excite nonlinear 
instabilities, and the choice of boundary condition 
implementation may introduce a stability limit. 

The present work maintains the advantages of an 
explicit procedure while using a multiple-grid con- 
vergence acceleration scheme to substantially 
improve the convergence rate at relatively low com- 
putational expense, in terms of increased operations 
count. This results in a net reduction in the com- 
putational work required to produce a converged 
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Here p, u, v, p, a and E are respectively 
density, velocity components in the x- and y- 
directions, pressure, sound speed and total energy 
per unit volume. The total energy per unit volume 
may be expressed as 

E = p(e +^(u 2 + v 2 )) 

where the specific internal energy, e, is related 
to the pressure and density by the simple law of a 
calorically perfect gas 

P - (y - l)pe 

with y denoting the ratio of specific heats. 

The coefficient of thermal conductivity, k, and 
the viscosity coefficients, x and \i , are assumed 
to be functions only of temperature. Furthermore, 
by invoking Stokes' assumption of zero bulk 
viscosity, x may be expressed in terms of the 
dynamic viscosity u as 



Specification of initial and boundary condi- 
tions complete the formulation of the problem. 
Initial conditions are specified as uniform flow 
at the isentropic Mach number implied by the ratio 
of exit static pressure to inlet total pressure. 
Boundary conditions are specified as follows: At 

the inlet, total temperature, total pressure, and 
flow angle are specified, while at the exit the 
static pressure is specified. Along solid surfaces 
the tangency condition is applied for inviscid 
flow, or the no-slip condition is applied and the 
temperature is specified for viscous flow. 

FINE-GRID SOLUTION PROCEDURE 

The fine-grid integration scheme employed in 
this work is the two-step Lax-Wendroff method known 
as the MacCormack 4 scheme. Schemes of Lax-Wendroff 
type may be arrived at intuitively by using 
Taylor's theorem to write the approximation: 


6q = At q t + — q tt (2) 

where we define the "correction" to q such that 
sq e q(t + At) - q(t ) 

Since we seek solutions to Eq. (1), time deriva- 
tives may be expressed as space derivatives: 


Re and Pr denote the Reynolds and Prandtl numbers, 
respectively. 

Although, for simplicity, the equations of 
motion are presented here written in Cartesian 
coordinates, Viviand^ has shown that their strong 
conservation law form may be maintained under an 
arbitrary time-dependent transformation of coordi- 
nates. Explicit detail concerning the generalized 
coordinate version of these equations has been 
provided by Steger 2 and need not be repeated 
here. 

We note that the thin-layer approximation, in 
the words of Baldwin and Lomax 2 , "... evolves 
directly from a realistic assessment of what is 
really being computed in a typical high Reynolds 
number Navier-Stokes simulation." A highly 
stretched mesh is used to resolve the large flow 
gradients normal to the vorticity-generating sur- 
face. Consequently, because of limitations on 
computer capacity, the diffusion terms involving 
derivatives parallel to the surface are not re- 
solved well enough to merit their computation. 

Similar viscous terms are also neglected in 
the classical boundary layer approximation. How- 
ever, while the boundary layer approximation 
replaces the normal momentum equation with the 
assumption that the normal pressure gradient is 
zero across the viscous layer, all momentum equa- 
tions are retained in the thin-layer approximation 
and no assumptions are made concerning the pres- 
sure. Consequently, the separation point is not a 
singularity of the thin-layer model equations nor 
do the problems associated with matching a boundary 
layer solution to an inviscid outer flow occur 
when they are used. 

In practice, the thin-layer assumption is im- 
plemented by using a body-fitted coordinate system 
and neglecting the viscous terms in the coordinate 
direction along the body. For Cartesian coordi- 
nates, with x representing the body-conforming 
coordinate, the thin-layer version of the Navier- 
Stokes equations is as given above. 
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where A and B are the Jacobian matrices (see 
Steger 2 for details): 

A = aF / a q B e aG / aq 

Substitution into Eq. (2) results in: 
sq = -At(F x + G y ) 


At 


-{[A(F x + G y )] + [B(F x + G y )] } (3) 


Second-order accurate spatial discretization of 
Eq. (3) then yields a one-step Lax-Wendroff method. 


Ni 's Method 


Prior to discussing MacCormack 's scheme, we 
derive the fine-grid solution procedure used by 
Ni‘ 5. This is a necessary prerequisite to the 
development of the coarse-grid acceleration scheme. 

If we make the following finite-volume type 
approximations: 
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it then follows that a discrete approximation to 
the first-order term in Eq. (3) may be written as: 




(5) 


Consistent with the above approximations and def- 
inition, we may write the approximation: 


_At(F x + G V ) . 1 . 1 = Aq . 1 • 1 


This motivates the definitions: 
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Eqs. (4) and (7) constitute the one-step Lax- 
Wendroff method used as a basid integration scheme 
by Ni. He gives the following heuristic interpre- 
tation to these equations: the first calculates 

the change, Aq, occurring in a control volume 
during the increment At while the second distri- 
butes the effects of the changes occurring in four 
nearest-neighbor control volumes to their common 
central nodal point where they are combined to 
form the correction, 6q, to the vector of conser- 
vation variables, as illustrated in Fig. 1. This 
interpretation motivates the construction of the 
coarse-grid acceleration scheme to be discussed 
subsequently. 

Notice that Ni 1 s scheme may also be thought of 
as a two-step scheme with a full time increment 
predictor defined by Eq. (4) and a corrector de- 
fined by Eq. (7). However, such an interpretation 
would not be consistent with the general practice 
of avoiding the computation of Jacobian matrices 
in two-step schemes. 

MacCormack's Method 

Following Richtmyer^ many two-step Lax- 
Wendroff schemes have been developed. They have 
superseded the one-step schemes by virtue of their 
lower operations counts. MacCormack's method is a 
particularly popular and efficient member of this 
class. The forward predictor - backward corrector 
version of this method may be written as 
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If we then approximate the second-order terms in 
Eq. (3) as: 
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First derivatives in the viscous terms are backward 
differenced in the predictor and forward differ- 
enced in the corrector. 

A second-order dissipative term is used to 
enforce the entropy condition across shocks in the 
supercritical results presented later. Second 
derivatives in the dissipation term are multiplied 
by the absolute value of the density gradient. 

This improves shock resolution but makes the dis- 
sipation first-order in the vicinity of shocks. 
Dissipation was not used for any of the subcritical 
results. 

This approach to solving fluid flow problems 
is quite robust and has been in widespread and 
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successful use for some time, both for the time- 
accurate computation of unsteady flow and for the 
time-asymptotic solution of steady flow problems. 

In the latter case, where accurate resolution of 
physical transients is ; not required, the numerical 
stability limitation inherent in this explicit 
method may severely restrict the speed of its con- 
vergence to the steady state. Providing a method 
to accelerate convergence in this case is one ob- 
jective of this work. 

COARSE-GRID ACCELERATION SCHEME 

Given the fine-grid corrections, which may be 
computed by any one- or two-step tax-Wendroff 
scheme, as shown in Johnson?, we wish to use 
successively coarser grids to propagate these cor- 
rections throughout the computational domain, thus 
accelerating convergence to the steady state while 
maintaining the accuracy determined by the fine- 
grid discretization. Define a fine grid such that 
the number of points in each direction is expres- 
sible as n(2P) + 1 for p and n integers such 
that p _> 0 is the number of grid coarsenings, 
and n > 2 is the number of intervals on the 
coarsest grid. Then successively coarser grids 
can be defined by successive deletion of every 
other point in each coordinate direction. 

Full Coarse-Grid Scheme 

The full coarse-grid acceleration scheme, as 
illustrated in Fig. 2, replaces the computation of 
coarse grid changes using Eq. (4) with a restric- 
tion of the latest fine-grid correction. This 
restricted fine-grid correction is then distributed 
according to a coarse-grid version of Eq. (7) to 
obtain a coarse-grid correction. This in turn is 
prolonged to the fine grid to become the new fine- 
grid correction. One time-cycle of the multiple- 
grid scheme is composed of an application of some 
Lax-Wendroff scheme on the fine grid followed by 
an application of the coarse-grid solution proce- 
dure to each successively coarser grid. The flow 

of information in this process is depicted in 
Fig. 3. Boundary conditions are updated only 
during the fine-grid computation. This decouples 
the coarse-grid acceleration scheme from the de- 
tails of boundary condition implementation. 

In the basic integration scheme, a change at 
one grid point affects only its nearest neighbors 
while, in a k-level multiple grid scheme, the same 
change affects all points up to 2 k_1 mesh 
spacings distant. Furthermore, since the change 
is always determined by information from the fine 
grid and simply propagated by the distribution 
formulae for coarser grids, fine grid accuracy is 
maintained. This concept for convergence acceler- 
ation was introduced by Ni for use in conjunction 
with his one-step Lax-Wendroff scheme, as described 
above. He illustrated its utility by solving the 
homoenthalpic two-dimensional Euler equations. 

Convective Coarse-Grid Scheme 

In Johnson 8 , consideration of the physical 
processes being modelled in a viscous flow compu- 
tation led to the formulation of an alternative 
coarse-grid scheme. Dissipative effects have a 
local character and their influence need not be 
taken into account in the construction of coarse- 
grid distribution formulae. Rather, it is the 
convective terms, with their global character, 
which are the key element in coarse-grid propaga- 
tion. Hence, a coarse-grid scheme for viscous 


flow computations may be formulated on the basis 
of the inviscid equations of motion. Such a con- 
vective coarse-grid scheme is inherently more ef- 
ficient than the full coarse-grid scheme because 
of the diminished computational effort associated 
with forming the Jacobian matrices of the Euler 
flux vectors rather than those of the viscous flux 
vectors. An additional benefit is that the con- 
vective coarse-grid scheme leads to a multiple- 
grid convergence acceleration procedure which is 
independent of the nature of the dissipative terms 
retained in the viscous model equations. That is 
to say that the coarse-grid scheme based on the 
Euler equations may be employed, without modifica- 
tion, to accelerate the convergence of viscous 
flow computations based on the Navier-Stokes equa- 
tions, the thin-layer equations, or any other 
viscous model equations which contain the full 
inviscid Euler equations. This claim is supported 
by the computational results presented subsequently. 

VECTORIZATION 

The original multiple-grid MacCormack code was 
developed on the IBM 370/3033 computer at the NASA 
Lewis Research Center. This is a scalar machine 
which runs at about one-half the speed of a CDC 
7600. All scalar timings presented in this paper 
were obtained on the Lewis 370 with full compiler 
optimization. 

It is well known that explicit algorithms in 
general and MacCormack 's algorithm in particular 
are highly amenable to vectorization (eg. Shang, 
Buning, Hankey and Wirth"). As the multiple-grid 
scheme presented here is explicit, it may also be 
vectorized in a straightforward manner. Indeed, we 
have vectorized the original research code for use 
on the Cray 1-S computer recently installed at NASA 
Lewis and have decreased its execution time by fac- 
tors ranging from 11.2 to 13.3 over the scalar code. 
The changes made to the code can be grouped into 
the five catagories listed below. 

1. Unrolling short inner DO loops over the 
four conservation variables. This change allows 
vectorization over longer outer loops. It also 
eliminates considerable loop overhead, thereby im- 
proving scalar performance. 

2. Revising DO loop ordering to make innermost 
loops the longest. The code is thus vectorized 
over grid rows. We remark that both the base Mac- 
Cormack algorithm and the multiple-grid scheme can 
be vectorized over the entire domain to obtain much 
longer vectors. While this would be desirable for 
a Cyber 205, it does not seem to be worth the added 
programming complexity for the Cray. Note also 
that making the innermost loop the longest lowers 
the paging rate on the virtual storage IBM machine. 

3. Removing IF statements and subroutine calls 
from vectorizable loops. 

4. Storing metric invariants used in a second- 
order damping term for the supercritical cases. 

This would be impractical in a storage-limited 3-D 
code. 

5. Addition of Cray compiler directives to 
ignore possible recursion due to ambiguous sub- 
scripts in the multiple-grid subroutine. To allow 
for several grid levels this subroutine is pro- 
grammed to sweep the grid with a variable stride 
equal to 2**(IGD-1), where IGD is the grid level. 

The Cray compiler does not know the value of this 
stride a priori, and, to avoid possible recursion, 
will not vectorize loops with an ambiguous stride. 

We know that the multiple-grid scheme is not re- 
cursive and direct the Cray compiler to ignore the 
possible vector dependencies. Cray compiler di- 
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recti ves start with a "C" in column one and look 
like comments to the IBM compiler. 

The multiple-grid scheme, as presently con- 
structed, requires the use of grids of length 
n(2P) + 1 for n and p integers as described 
earlier. Since boundary conditions are computed 
separately from interior points, fine-grid vectors 
are of length n(2P) - 1 and coarse-grid vectors 
are of length n(2P -k+ l) - 1, where k > 2 is the 
grid number. For the cases presented later, n = 4 
and p = 4 (in the x-di recti on) giving a fine-grid 
vector length of 63, which is a near^optimal vector 
length for the Cray. Coarse-grid vector lengths 
can be much shorter, and consequently coarse-grid 
calculations do not vectorize as efficiently as 
fine-grid calculations. Nevertheless, the coarse- 
grid calculations benefit from vectorization as 
long as the vector length remains greater than about 
four. 

Note that a Cray vector is defined by a starting 
location, a length and a stride through memory, 
making the multiple-grid scheme readily vectorizable 
on the Cray. However on the Cyber 205, vectors are 
defined only by a starting location and length, with 
elements assumed to be in contiguous memory loca- 
tions. This would make the multiple-grid scheme 
more difficult to vectorize on a 205. 

The vectorized multiple-grid code is completely 
machine independent and has been run on both the 
IBM 370/3033 and the Cray 1-S, in 64-bit precision 
on each, to produce the results in this paper. To 
illustrate the performance increase due to vector- 
ization, we present the following timing results 
for a typical inviscid supercritical flow case. 
First, the vectorized code runs 1.77 times faster 
on the IBM than the original code did, due strictly 
to scalar efficiency improvements. Second, in a 
scalar mode on the Cray the code runs 5.82 times 
faster on one grid and 5.31 times faster on three 
grids than on the IBM, reflecting mostly the clock 
times of the two computers (12.5 ns for the Cray, 

60 ns for the IBM). Third, in vector mode on the 
Cray the code runs 2.36 times faster on one grid 
and 2.15 times faster on three grids than in scalar 
mode. Averaged over several cases we find that the 
vectorized Cray code runs 13.3 times faster on one 
grid and 11.2 times faster on three grids than the 
scalar IBM code. Again, the relative difference in 
speed-up due to vectorization between the single- 
and multiple-grid codes is due primarily to the 
short vector lengths on the coarse grids. It may 
be possible to improve the vectorization of the 
multiple-grid algorithm by restructuring the storage 
of the coarse-grid data. 

COMPUTATIONAL RESULTS 

We report on a sampling of the computations 
performed thus far with the vectorized, multiple- 
gridded MacCormack algorithm. The full Euler equa- 
tions have been solved across the entire spectrum 
of subsonic, transonic and supersonic flow. The 
thin-layer version of the Navier-Stokes equations 
has been solved for both attached and separated 
laminar subsonic flows. All computations have been 
performed in two dimensions. Extension of these 
results to the full Navier-Stokes equations, tur- 
bulent flow or three dimensions presents no essen- 
tial difficulties. Results for attached and 
separated turbulent subsonic flows are reported in 

Johnson. 10 

For each case presented here we define a 
multiple-grid work reduction factor to be the ratio 
of the work required to produce a converged solution 


on a single fine grid to the work required to pro- 
duce the same result on a sequence of grids. 

Inviscid Flow 

Ultimately we intend to use the multiple-grid 
MacCormack algorithm to compute flows in turbo- 
machinery cascades. To investigate the robustness 
of the algorithm we have computed the flow about 
the cascade of bicircular arc airfoils shown in 
Fig. 4 over a wide range of flow conditions, from 
low speed to choked. We will reference the dif- 
ferent flow conditions by the nominal Mach number 
implied by the imposed isentropic static-to-total 
pressure ratio. 

All inviscid computations were made using the 
65 x 17 node fine grid as shown in Fig. 5 and the 
coarse grids as indicated in Table I. All computa- 
tions have been run to convergence on a single grid 
and on a three-grid sequence, on both the IBM 
370/3033 and the Cray 1-S. We emphasize that the 
converged single- and multiple-grid solutions are 
identical. The computations are considered to be 
converged when the average unsealed residual in pu 
drops below 10“ 4 , a decrease of approximately 
three decades. Detailed timing and convergence 
data are presented in Table II and will be dis- 
cussed subsequently. 

Fig. 6 shows computed points on a mass flow 
versus pressure ratio operating curve for the 
cascade. The largest error in integrated mass 
flow along any grid line in any of the cases was 
0.38 percent. The computed points are compared to 
the 1-D isentropic theory. The 1-D choking pres- 
sure ratio is 0.73503 while the computed 2-D 
choking pressure ratio has been bracketed between 
0.70155 and 0.72093. Details of the flows at the 
four right-most points on the curve are shown in 
Figs. 7 through 10. 

Low-speed results at M = 0.2 are shown in 
Fig. 7. Fig. 7(a) shows isomachs while Fig. 7(b) 
shows Mach number distributions on the body and 
symmetry lines. Figs. 7(c) and 7(d) show con- 
vergence histories on 1 and 3 grids, respectively. 
The low-speed solution converges extremely slowly 
on a single grid, taking 5740 time cycles. On 
three grids the solution converges in 780 time 
cycles, with a net multiple-grid work reduction 
factor of 4.92 on the IBM and 4.02 on the Cray. 

Subcritical results at M = 0.5 are shown in 
Fig. 8. This case has been well documented in 
Johnson. 7 The single-grid solution converges in 
4300 time cycles while on three grids the solution 
converges in 710 time cycles. The multiple-grid 
work reduction factor is 3.92 on the IBM and 3.31 
on the Cray. 

Supercritical results at M = 0.675 are shown 
in Fig. 9. This case has also been well documented 
in Johnson. 7 The single-grid solution converges 
in 2310 time cycles while on three grids the solu- 
tion converges in 830 time cycles. The multiple- 
grid work reduction factor is 2.01 on the IBM and 
1.67 on the Cray. 

Choked results at M = 0.73 are shown in 
Fig. 10. Of particular interest is the greatly 
increased single-grid convergence rate over that 
for the supercritical but unchoked case shown in 
Fig. 9. The choked single-grid solution converges 
in 1350 time cycles, which is almost twice as fast 
as the unchoked solution. On three grids the 
choked solution converges in 710 time cycles but, 
because the single-grid convergence rate is high, 
the multiple-grid work reduction factor is only 
1.38 on the IBM and 1.14 on the Cray. 
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By fixing all inlet conditions at M = 1.6 
(below the choking mass flow) and by extrapolating 
all exit conditions, the same cascade was run as a 
supersonic inlet diffuser. These results are shown 
in Fig. 11. Isomachs, shown in Fig. 11(a), show a 
strong oblique shock leaving the leading edge. 

The leading-edge shock intersects the upper sym- 
metry plane as a normal shock which reflects 
downward to intersect with another oblique shock 
leaving the trailing edge. Surface Mach numbers, 
shown in Fig. 11(b), show some overshoot before 
the normal shock on the symmetry plane, indicating 
too little damping. The single-grid convergence 
history in Fig. 11(c) is not as steep as the choked 
case in Fig. 10(c), converging here in 1840 time 
cycles. The three-grid history in Fig. 11(d) shows 
convergence in 950 time cycles. The multiple-grid 
work reduction factor is 1.38 on the IBM and 1.21 
on the Cray. 

Subcritical flow through a similar cascade but 
with a 40 percent blade thickness and subject to a 
linear inlet shear profile is shown in Fig. 12. 

The inlet profile has a nominal Mach number of 0.5 
at the top and 0.1 at the bottom. Isomachs in 
Fig. 12(a) show near-perfect left-to-right sym- 
metry. Surface Mach numbers are shown in 
Fig. 12(b). The single-grid convergence rate in 
Fig. 12(c) is slow, as are the other subcritical 
cases, converging in 3030 time cycles. On three 
grids the solution converges in 1410 time cycles. 
The multiple-grid work reduction factor is 1.42 on 
the IBM and 1.18 on the Cray. 

Detailed timing and convergence data for the 
six inviscid cases are presented in Table 11(a). 
Note that times to convergence on the Cray ranged 
from 7.3 to 14.5 seconds. Multiple-grid work re- 
duction factors are summarized in Table 11(b). 

The most interesting result in this table is that 
the multiple-grid work reduction factor is greatest 
for the lowest speed flows. However, in all cases 
the increased computational work of the multiple- 
grid scheme was more than offset by the improved 
convergence rate. 

Viscous Flow 

We first consider the subsonic flow through a 
cascade of unstaggered flat plates at zero angle 
of attack, as illustrated in Fig. 13. The ratio 
of exit static pressure to upstream total pressure 
is 0.8430191, yielding flow Mach numbers in the 
vicinity of 0.5 for the test cases to be exhibited 
here. The Reynolds numbers, based on cascade gap 
and critical speed, span the approximate range 
from 8.4 X 10^ to 2.0 X 10®. Symmetry is 
invoked to limit the size of the computational 
domain and the flow is assumed to be laminar. 

These assumptions are made for convenience in 
specifying the number and location of fine-grid 
nodal points and do not imply limitations on the 
generality of the method. The choice of boundary 
conditions is also indicated in Fig. 13. 

As illustrated in Fig. 14, three different 
fine grids are employed in this study. All have 
the same number of nodal points (65 x 33) and have 
their transverse grid lines located at the same 
positions. They differ in the positioning of their 
lateral grid lines. These are smoothly stretched 
away from the solid boundary in a geometric pro- 
gression, starting from the initial spacings in- 
dicated in Fig. 14. These fine grids each allow 
the construction of four successively coarser 
grids, as indicated in Table III. The members of 
these grid families may then be used in combination 


to form multiple-grid sequences of length one 
through five. 

Computations have been performed for the com- 
binations of Reynolds number and fine grid con- 
figuration indicated in Table IV. Isomach contours 
for the converged solutions produced for each case 
are shown in Fig. 15. The contour levels displayed 
are not equally spaced and are the same for all 
five cases shown. Nevertheless, they provide a 
good qualitative indication of the nature of the 
computed flowfields. More quantitative information 
is provided by the normalized u-velocity profiles, 
which are also illustrated in Fig. 15. The 
u-velocity, normalized with its value at the top 
boundary and same streamwise station, is plotted 
as a function of relative distance from the bottom 
boundary. Curves for every second streamwise sta- 
tion, starting with the plate leading edge and 
ending at the outflow boundary, are displayed. 

They are staggered in proportion to the spacing of 
their respective streamwise stations. The data 
displayed in Fig. 15 are for conditions of optimal 
work reduction, as indicated in Table IV. However, 
in each case the solution obtained is not a func- 
tion of grid sequence length. This was verified 
by extensive computational experimentation. 

Convergence histories are also shown in 
Fig. 15. For each cast, we display the fine grid 
convergence history and the corresponding conver- 
gence history for that grid sequence length which 
produced the best work reduction factor. Several 
of the plots have been truncated to fit within the 
residual range displayed. 

For all three test cases and the five possible 
grid sequence lengths, the computational work re- 
quired to reduce a standard error measure to a 
specified tolerance has been recorded. In each 
case, based on these data, we have estimated a 
multiple-grid work reduction factor for the cor- 
responding optimal grid-sequence length. The 
results obtained are recorded in Table IV. Multiple- 
grid work reduction factors ranging from 1.8 to 4.8 
have been realized. We note that Johnson® ob- 
served that although the work reduction factor 
and, possibly, the optimal grid sequence length 
decrease with increasing grid stretching, they do 
not appear to decrease with increasing Reynolds 
number. 

Further viscous flow computations have been 
performed in a cascade where the flat plates were 
replaced by sting-mounted bicircular-arc airfoils, 
as depicted in Fig. 16. Here, the grids used were 
simply sheared versions of those used previously 
in the flat plate test cases. The combinations of 
Reynolds number, thickness-to-chord ratio and grid 
configuration tested are listed in Table V. Iso- 
mach contours, normalized u-velocity profiles and 
convergence histories for these cases are shown in 
Fig. 17. The work reduction factors attained 
ranged from 1.5 to 3.8 and are reported in Table V. 

The viscous results presented in this report 
were run as scalar computations on the IBM 
370/3033. A vectorized version of the viscous 
codehas been benchmarked on the Cray 1-S to de- 
termine the work reduction attainable through vec- 
torization for viscous cases. By combining the 
multiple-grid work reduction factor with the work 
reduction attained by vectorizing the algorithm, 
we have determined that the overall work reduction 
for the viscous flow cases ranged as high as 76.8. 

Several issues bearing on the multiple-grid 
work reduction factor should be mentioned at this 
juncture. When one considers convergence acceler- 
ation of the turbulent full Navier-Stokes equa- 
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tions, greater work reduction than obtained here 
will result by virtue of the inclusion of the full 
viscous terms and turbulence modelling on the fine 

grid. The treatment of the Jacobian matrices used 
in the coarse-grid acceleration scheme has a large 
influence on the efficiency of the coarse-grid 
computations and, hence, on the work reduction 
factor. Substantial improvement appears to be 
possible over the current treatment of these 
Oacobians. In the present computations, injection 
is used as the restriction operator and linear 
interpolation is chosen as the prolongation oper- 
ator. These choices may not be optimal for use on 
highly stretched grids. Better choices could in- 
crease both the optimal grid sequence length and 
the work reduction factor. Similar consequences 
might result from an alternative coarse-grid for- 
mation strategy. 

Given the encouraging results obtained to date, 
more comprehensive testing and more sophisticated 
applications of the inviscid and viscous flow con- 
vergence acceleration and vectorization ideas pre- 
sented here are planned. 

CONCLUSIONS 

A vectorized, machine-independent, explicit 
multiple-grid algorithm for the efficient solution 
of the steady Euler and Navier-Stokes equations 
has been presented. 

The coarse-grid scheme used to accelerate con- 
vergence is compatible with all one- and two-step 
Lax-Wendroff algorithms. Here, it has been used 
in conjunction with MacCormack's method. 

The convective version of the coarse-grid 
scheme may be used with any set of flow equations 
in the hierarchy ranging from the Euler equations 
to the full Navier-Stokes equations. Here, we 
have used it with the thin-layer version of the 
Navier-Stokes equations. 

Computational results have been presented for 
subsonic, transonic and supersonic inviscid flows 
and for separated and attached, laminar, subsonic 
viscous flows. 

Vectorization of the algorithm proved rela- 
tively straightforward and, in conjunction with 
multiple-gridding, yielded overall work reduction 
factors ranging from 15.3 to 76.8, over a fairly 
broad range of flow conditions. 
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TABLE |. - INVISCID FLOW GRID DESCRIPTIONS 


GRID 

1 

2 

3 

NUMBER OF 
POINTS 

65x17 

33x9 

17x5 


TABLE lla. - TIMING AND CONVERGENCE DATA FOR INVISCID FLOW CASES 


CASE 

FIG. 

MACHINE 

TIME FOR 5K CYC. 
(sec) 

CYC. TO CONVERGE 

TIME TO CONVERGE 
(sec) 




T1GRID 

T3GRIDS 

N1GRID 

N3GRIDS 

tlGRID 

t3GRIDS 

M = .2 

7 

IBM 

368.9 

551.5 

H 

780 

420.2 

86.0 

LOW SPEED 

l 

CRAY 

28.0 

51.2 

1 

780 

32.2 

8.0 

M •= .5 


IBM 

370.7 

572.4 

4300 

710 

318.8 

81.3 

SUBCRITICAL 

CRAY 

28.0 

51.2 

4300 

710 

81.3 

7.3 

M = . 675 


IBM 

481.6 

665.2 

mm 

830 

222.5 

110.4 

SUPER CRITICAL 

7 

CRAY 

35.1 

58.3 

wm 

830 

16.2 

9.7 

M = .73 

10 

IBM 

478.7 

658.3 

1350 

710 

129.2 

93.5 

CHOKED 

CRAY 

35.1 

58.3 

1350 

710 

9.5 

8.3 

M- 1.6 

11 

IBM 

467.0 

654.7 

1840 

950 

171.9 

124.3 

INLET DIFFUSER 

CRAY 

36.1 

58.1 

1840 

950 

12.8 

11.0 

M ■ .1 - .5 

12 

IBM 

366.9 

555.1 

3030 


222.3 

156.5 

INLET SHEAR 

CRAY 

28.0 

51.2 

3030 


17.0 

14.5 


















TABLE Mb. - MULTIPLE-GRID WORK REDUCTION FACTORS FOR INVISCID FLOW CASES 


CASE 

FIG. 

MACHINE 

(ADDED WORK)' 1 

CONVERGENCE 

WORK REDUCTION 





RATE INC. 

FACTOR 




T1GRID/T3GRIDS 

N1GRID/N3GRIDS 

T1GRID/T3GRIDS 

M - .2 

7 

IBM 

.669 

7. 36 

4.92 

LOW SPEED 

CRAY 

.547 

7. 36 

4.02 

M - .5 

8 

IBM 

.648 

6.06 

3.92 

SUBCRITICAL 

CRAY 

.547 

6.06 

3.31 

M = .675 

Q 

IBM 

.724 

2.78 

2.01 

SUPERCRITICAL 

7 

CRAY 

.602 

2.78 

1.67 

M = .73 

10 

IBM 

.727 

1.90 

1.38 

CHOKED 

CRAY 

.602 

1.90 

1.14 

M ■ 1.6 

11 

IBM 

.713 

1.94 

1.38 

INLET DIFFUSER 

CRAY 

.621 

1.94 

1.21 

M-.1-.5 

12 

IBM 

.661 

2.15 

1.42 

INLET SHEAR 

CRAY 

.647 

2.15 

1.18 


TABLE III. - VISCOUS FLOW GRID DESCRIPTIONS 


GRID 

1 

2 

3 

4 

5 

NUMBER OF 
POINTS 

65x33 

33 x 17 

17 x 9 

9x5 

5x3 





TABLE IV. - VISCOUS FLAT PLATE TEST CASES 


TEST CASE 

REYNOLDS 

NUMBER 

INITIAL 

TRANSVERSE 

SPACING 

OPTIMAL 

SEQUENCE 

LENGTH 

WORK 

REDUCTION 

FACTOR 

a 

8. 4 x 10 3 

0.0125 

5 

4.8 

b 

3. 4 x 10 4 

0.00625 

2 OR 3 

3.0 

c 

2.0X10 5 

0.00250 

3 

1.8 


TABLE V. - VISCOUS BICIRCULAR ARC TEST CASES 


TEST CASE 

REYNOLDS 

NUMBER 

THICKNESS 

CHORD 

INITIAL 

TRANSVERSE 

SPACING 

OPTIMAL 

SEQUENCE 

LENGTH 

MULTIPLE-GRID 
WORK REDUCTION 
FACTOR 

a 

8. 4 x 10 3 

0.100 


3 

3.8 

b 

3.4 x 10 4 

0.100 


2 

1.8 

c 

3.4 x 10 4 

0.050 


2 

2.5 

d 

2.0 xlO 5 

0.025 

1 

2 

1.5 

























R - RESTRICTION OF LATEST FINE-GRID 6q AS COARSE- 
GRID Aq 

P - PROLONGATION OF COARSE-GRID 6q TO FINE GRID 



GRID 

1 (FINEST) 

2 

3 


N (COARSEST) 

Figure 3. - Multiple-grid information flow. 
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Figure 4. - Inviscid bicircular arc cascade problem. 
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(c) Convergence history, lgrid. (d) Convergence history, 3 grids. 

Figure 12. - Inviscid bicircular arc cascade with linear inlet shear. 
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Figure 13. - Viscous flat plate cascade problem, 




SMALLEST GRID SPACING: Ax - 0.03125 

Ay - 0. 00625 



SMALLEST GRID SPACING: Ax = 0.03125 

Ay = 0.0025 


Figure 14. - Viscous flow fine grid configurations. 
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Figure 15. - Viscous flat plate cascade results. 
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I iyure 15. - Continued. 
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(c) Re - 2.0x10 s . 
Figure 15. - Concluded. 
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Figure 16. - Viscous bicircular arc cascade problem. 
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